Published 5 months ago by Neural Magic

[vLLM Office Hours #27] Intro to llm-d for Distributed LLM Inference

In this session, we explored the latest updates in the vLLM release, including the new Magistral model, FlexAttention support, multi-node serving optimizations, and more. We also did a deep dive into llm-d, the new Kubernetes-native, high-performance distributed LLM inference framework co-designed with the Inference Gateway (IGW). You'll learn what llm-d is, how it works, and see a live demo of it in action.

Session slides:
Join our bi-weekly vLLM office hours: